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Abstract 



We propose a bottom-up variant of Earley de- 
duction. Bottom-up deduction is preferable to top- 
down deduction because it allows incremental process- 
ing (even for head-driven grammars), it is data-driven, 
no subsumption check is needed, and preference values 
attached to lexical items can be used to guide best- 
first search. We discuss the scanning step for bottom- 
up Earley deduction and indexing schemes that help 
avoid useless deduction steps. 

1 Introduction 

Recently, there has been a lot of interest in Earley 
deduction []10| with applications to parsing and gener- 
ation @ Jl, 1- 

Earley deduction is a very attractive framwork for 
natural language processing because it has the follow- 
ing properties and applications. 

• Memoization and reuse of partial results 

• Incremental processing by addition of new items 

• Hypothetical reasoning by keeping track of de- 
pendencies between items 

• Best-first search by means of an agenda 

Like Earley's algorithm, all of these approaches op- 
erate top-down (backward chaining). The interest has 
naturally focussed on top-down methods because they 
are at least to a certain degree goal-directed. 



*This work was supported by the Deutsche Forschungs- 
gemeinschaft through the project N3 "Bidirektionale Lin- 
guistische Deduktion (BiLD)" in the Sonderforschungs- 
bereich 314 Kiinsthche Intelligenz — Wissensbasierte Sys- 
teme and by the Commission of the European Communi- 
ties through the project LRE-61-061 "Reusable Grammat- 
ical Resources." I would like to thank Giinter Neumann, 
Christer Samuelsson and Mats Wiren for comments on this 
paper. 



In this paper, we present a bottom-up variant of 
Earley deduction, which we find advantageous for the 
following reasons: 

Incrementality: Portions of an input string can be 
analysed as soon as they are produced (or gener- 
ated as soon as the what-to-say component has 
decided to verbalize them), even for grammars 
where one cannot assume that the left-corner has 
been predicted before it is scanned. 

Data-Driven Processing: Top-down algorithms are 
not well suited for processing grammatical the- 
ories like Categorial Grammar or HPSG that 
would only allow very general predictions be- 
cause they make use of general schemata instead 
of construction-specific rules. For these gram- 
mars data-driven bottom-up processing is more 
appropriate. The same is true for large-coverage 
rule-based grammars which lead to the creation 
of very many predictions. 

Subsumption Checking: Since the bottom-up algo- 
rithm does not have a prediction step, there is 
no need for the costly operation of subsumption 
checking .0 

Search Strategy: In the case where lexical entries 
have been associated with preference informa- 
tion, this information can be exploited to guide 
the heuristic search. 



2 Bottom-up Earley Deduction 

Earley deduction JhJ is based on grammars encoded as 
definite clauses. The instantiation (prediction) rule of 
top-down Earley deduction is not needed in bottom- 
up Earley deduction, because there is no prediction. 
There is only one inference rule, namely the reduction 



1 Subsumption checking may still be needed to filter out 
spurious ambiguities. 
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rule (0)0 In (|J), X, G and G are literals, is a 
(possibly empty) sequence of literals, and a is the most 
general unifier of G and G' . The leftmost literal in the 
body of a non-unit clause is always the selected literal. 



X <- G Ail 

G' *- 
a(X <- ft) 



(1) 



In principle, this rule can be applied to any pair 
of unit clauses and non-unit clauses of the program 
to derive any consequences of the program. In order 
to reduce this search space and achieve a more goal- 
directed behaviour, the rule is not applied to any pair 
of clauses, but clauses are only selected if they can 
contribute to a proof of the goal. The set of selected 
clauses is called the c/iari.|^] The selection of clauses is 
guided by a scanning step (section |2.1[ ) and indexing 
of clauses (section 2.2). 



2.1 Scanning 

The purpose of the scanning step, which corresponds 
to lexical lookup in chart parsers, is to look up base 
cases of recursive definitions to serve as a starting point 
for bottom-up processing. The scanning step selects 
clauses that can appear as leaves in the proof tree for 
a given goal G. 

Consider the following simple definition of an 
HPSG, with the recursive definition of the predicate 
sign/1.0 

sign(X) <- phrasal_sign(X) . 
sign(X) <- lexical_sign(X) . 

phrasal_sign(X & dtrs : (head_dtr : HD & 

comp_dtr:CD) ) <- 

sign(HD) , 
sign(CD) , 

principles (X,HD, CD) . 



phon : CD_Ph) <- 
sequence_union (CD_Ph , HD_Ph , X_Ph) . 

The predicate sign/1 is defined recursively and 
the base case is the predicate lexical_sign/l. But, 
clearly it is not restrictive enough to find only the pred- 
icate name of the base case for a given goal. The 
base cases must also be instantiated in order to find 
those that are useful for proving a given goal. In 
the case of parsing, the lookup of base cases (lexical 
items) will depend on the words that are present in 
the input string. This is implied by the first goal of 
the predicate principles/3, the constituent order 
principle, which determines how the PHON value of a 
constituent is constructed from the phon values of its 
daughters. In general, we assume that the constituent 
order principle makes use of a linear and non-erasing 
operation for combining strings.^ If this is the case, 
then all the words contained in the PHON value of the 
goal can have their lexical items selected as unit clauses 
to start bottom-up processing. 

For generation, an analogous condition on logical 
forms has been proposed by Shieber (ll| as the "se- 
mantic monotonicity condition," which requires that 
the logical form of every base case must subsume some 
portion of the goal's logical form. 

Base case lookup must be defined specifically for 
different grammatical theories and directions of pro- 
cessing by the predicate lookup/2, whose first argu- 
ment is the goal and whose second argument is the 
selected base case. The following clause defines the 
lookup relation for parsing with HPSG. 

7« lookup (+Goal , -BaseCase) 
lookup(sign(phon:PhonList) , 

lexical_sign (phon: [Word] & synsem:X) 
) <- 

member (Word, PhonList) , 
lexicon (Word, X) . 



principles (X,HD, CD) <- 

const ituent_order_principle (X , HD , CD) , 
head_f eature_principle(X,HD) , 



const ituent_order_principle (phon : X_Ph , 

phon:HD_Ph, 



2 This rule is called combine by Earley, and is also re- 
ferred to as the fundamental rule in the literature on 
chart parsing. 

3 The chart differs from the state of l lOl in that clauses 
in the chart are indexed (cf. section |2,2|) . 

4 We use feature terms in definite clauses in addition to 
Prolog terms, f :X denotes a feature structure where X is 
the value of feature f , and X & Y denotes the conjunction 
of the feature terms X and Y. 



Note that the base case clauses can become further 
instantiated in this step. If concatenation (of difference 
lists) is used as the operation on strings, then each base 
case clause can be instantiated with the string that 
follows it. This avoids combination of items that are 
not adjacent in the input string. 

lookup(sign(phon:PhonList) , 

lexical_sign(phon: [Word I Suf ] -Suf & 
synsem: Synsem) 

) <- 

append(_, [WordlSuf] .PhonList) , 
lexicon(Word, Synsem) . 



5 There is an obvious connection to the Linear Context- 
Free Rewriting Systems (LCFRS) Jl|, 
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In bottom-up Earley deduction, the first step to- 
wards proving a goal is perform lookup for the goal, 
and to add all the resulting (unit) clauses to the chart. 
Also, all non-unit clauses of the program, which can 
appear as internal nodes in the proof tree of the goal, 
are added to the chart. 

The scanning step achieves a certain degree of goal- 
directedness for bottom-up algorithms because only 
those clauses which can appear as leaves in the proof 
tree of the goal are added to the chart. 

2.2 Indexing 

An item in normal context-free chart parsing can be 
regarded as a pair (R,S) consisting of a dotted rule 
R and the substring S that the item covers (a pair of 
starting and ending position). The fundamental rule 
of chart parsing makes use of these string positions to 
ensure that only adjacent substrings are combined and 
that the result is the concatenation of the substrings. 

In grammar formalisms like DCG or HPSG, the 
complex nonterminals have an argument or a feature 
(phon) that represents the covered substring explic- 
itly. The combination of the substrings is explicit in 
the rules of the grammar. As a consequence, Earley 
deduction does not need to make use of st ring posi- 
tions for its clauses, as Pereira and Warren [[LO] point 
out. 

Moreover, the use of string positions known from 
chart parsing is too inflexible because it allows only 
concatenation of adjacent contiguous substrings. In 
linguistic theory, the interest has shifted from phrase 
structure rules that combine adjacent and contiguous 
constituents to 

• principle-based approaches to grammar that 
state general well-formedness conditions in- 
stead of describing particular constructions (e.g. 

HPSG) 

• operations on strings that go beyond concatena- 
tion (head wrapping pT| , tree adjoining |lqj, se- 
quence union Jl2|). 

The string positions known from chart parsing 
are also inadequate for generation, as pointed out by 
Shieber |l3) in whose generator all items go from posi- 
tion to so that any item can be combined with any 
item. 

However, the string positions are useful as an in- 
dexing of the items so that it can be easily detected 
whether their combination can contribute to a proof of 
the goal. This is especially important for a bottom-up 
algorithm which is not goal-directed like top-down pro- 
cessing. Without indexing, there are too many com- 
binations of items which are useless for a proof of the 



goal, in fact there may be infinitely many items so that 
termination problems can arise. 

For example, in an order-monotonic grammar for- 
malism that uses sequence union as the operation for 
combining strings, a combination of items would be 
useless which results in a sign in which the words are 
not in the same order as in the input string [jyj . 

We generalize the indexing scheme from chart pars- 
ing in order to allow different operations for the com- 
bination of strings. Indexing improves efficiency by 
detecting combinations that would fail anyway and by 
avoiding combinations of items that are useless for a 
proof of the goal. 

We define an item as a pair of a clause CI and an 
index Idx, written as (CI, Idx). 

Below, we give some examples of possible indexing 
schemes. Other indexing schemes can be used if they 
are needed. 

1. Non-reuse of Items: This is useful for LCFRS, 

where no word of the input string can be used 
twice in a proof, or for generation where no part 
of the goal logical form should be verbalized twice 
in a derivation. 

2. Non-adjacent combination: This indexing 

scheme is useful for order-monotonic grammars. 

3. Non-directional adjacent combination: This 

indexing is used if only adjacent constituents can 
be combined, but the order of combination is not 
prescribed (e.g. non-directional basic categorial 
grammars). 

4. Directional adjacent combination: 

This is used for grammars with a "context-free 
backbone." 

5. Free combination: Allows an item to be used 

several times in a proof, for example for 
the non-unit clauses of the program, which 
would be represented as items of the form 
(X<-Gi A... A G„, free). 

The following table summarizes the properties of 
these five combination schemes. Index 1 (II) is the 
index associated with the non-unit clause, Index 2 (12) 
is associated with the unit clause, and II * 12 is the 
result of combining the indices. 





Index 1 


Index 2 


Result 


Note 




II 


12 


11*12 




1. 


X 


Y 


XUY 


Any = 


2. 


X 


Y 


XQY 




3. 


X + Y 


Y + Z 


X + Z 






Y + Z 


X + Y 


X + Z 




4. 


X-Y 


Y - Z 


X-Z 




5. 


X 


'free' 


X 






'free' 


X 


X 
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In case 2 ("non-adjacent combination"), the in- 
dices X and Y consist of a set of string positions, and 
the operation is the union of these string positions, 
provided that no two string positions from X and Y 
do overlap. 

In (^), the reduction rule is augmented to handle 
indices. X *Y denotes the combination of the indices 
X and Y. 

{X <- G Afl, II) 

< G ^ /2 > (2) 
(a(X <- £1), 11*12) y ' 

With the use of indices, the lookup relation be- 
comes a relation between goals and items. The follow- 
ing specification of the lookup relation provides index- 
ing according to string positions as in a chart parser 
(usable for combination schemes 2, 3, and 4). 

lookup(sign(phon:PhonList) , 

item(lexical_sign(phon : [Word] & 
synsem:X) , 

Begin-End) 

) <- 

nth_member (Word , Begin , End , PhonList ) , 
lexicon(Word,X) . 

nth_member (X , , 1 , [X I _] ) . 
nth_member(X,Nl,N2, [_|R] ) <- 

nth_member(X,NO,Nl,R) , 

N2 is Nl + 1. 

2.3 Goal Types 

In constraint-based grammars there are some predi- 
cates that are not adequately dealt with by bottom-up 
Earley deduction, for example the Head Feature Prin- 
ciple and the Subcategorization Principle of HPSG. The 
Head Feature Principle just unifies two variables, so 
that it can be executed at compile time and need not 
be called as a goal at runtime. The Subcategoriza- 
tion Principle involves an operation on lists (append/3 
or delete/3 in different formalizations) that does not 
need bottom-up processing, but can better be evalu- 
ated by top-down resolution if its arguments are suf- 
ficiently instantiated. Creating and managing items 
for these proofs is too much of a computational over- 
head, and, moreover, a proof may not terminate in the 
bottom-up case because infinitely many consequences 
may be derived from the base case of a recursively de- 
fined relation. 

In order to deal with such goals, we associate the 
goals in the body of a clause with goal types. The 
goals that are relevant for bottom-up Earley deduction 
are called waiting goals because they wait until they 
are activated by a unit clause that unifies with the 



goal|] Whenever a unit clause is combined with a non- 
unit clause all goals up to the first waiting goal of the 
resulting clause are proved according to their goal type, 
and then a new clause is added whose selected goal is 
the first waiting goal. 

In the following inference rule for clauses with 
mixed goal types, H is a (possibly empty) sequence 
of goals without any waiting goals, and O is a (possi- 
bly empty) sequence of goals starting with a waiting 
goal, a is the most general unifier of G and G', and 
the substitution t is the solution which results from 
proving the sequence of goals S. 

(X <- GAS A Q,I1) 

^ /2 > (3) 

(t<t(X <- n),Il*I2) v ' 

2.4 Correctness and Completeness 

In order to show the correctness of the system, we must 
show that the scanning step only adds consequences of 
the program to the chart, and that any items derived 
by the inference rule are consequences of the program 
clauses. The former is easy to show because all clauses 
added by the scanning step are instances of program 
clauses, and the inference rule performs a resolution 
step whose correctness is well-known in logic program- 
ming. The other goal types are also proved by resolu- 
tion. 

There are two potential sources of incompleteness 
in the algorithm. One is that the scanning step may 
not add all the program clauses to the chart that are 
needed for proving a goal, and the other is that the 
indexing may prevent the derivation of a clause that is 
needed to prove the goal. 

In order to avoid incompleteness, the scanning step 
must add all program clauses that are needed for a 
proof of the goal to the chart, and the combination 
of indices may only fail for inference steps which are 
useless for a proof of the goal. That the lookup relation 
and the indexing scheme satisfy this property must be 
shown for particular grammar formalisms. 

In order to keep the search space small (and finite 
to ensure termination) the scanning step should (ide- 
ally) add only those items that are needed for proving 
the goal to the chart, and the indexing should be cho- 
sen in such a way that it excludes derived items that 

6 The other goal types are top-down goals (top-down 
depth- first search), x-corner goals (which combine bottom- 
up and top-down processing like left-corner or head-corner 
algorithms), Prolog goals (which are directly executed by 
Prolog for efficiency or side-effects), and chart goals which 
create a new, independent chart for the proof of the goal. 
Dorre || proposes a system with two goal types, namely 
trigger goals, which lead to the creation of items and other 
goals which don't. 
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are useless for a proof of the goal. 

3 Best-First Search 

For practical NL applications, it is desirable to have 
a best-first search strategy, which follows the most 
promising paths in the search space first, and finds 
preferred solutions before the less preferred ones. 

There are often situations where the criteria to 
guide the search are available only for the base cases, 
for example 

• weighted word hypotheses from a speech recog- 
nizer 

• readings for ambigous words with probabilities, 
possibly assigned by a stochastic tagger (cf. 0) 

• hypotheses for correction of string errors which 
should be delayed 

Goals and clauses are associated with preference 
values that are intended to model the degree of con- 
fidence that a particular solution is the 'correct' one. 
Unit clauses are associated with a numerical preference 
value, and non-unit clauses with a formula that deter- 
mines how its preference value is computed from the 
preference values of the goals in the body of the clause. 
Preference values can (but need not) be interpreted as 
probabilities.^ 

The preference values are the basis for giving pri- 
orities to items. For unit clauses, the priority is iden- 
tified with the preference value. For non-unit clauses, 
where the preference formula may contain uninstanti- 
ated variables, the priority is the value of the formula 
with the free variables instantiated to the highest pos- 
sible preference value (in case of an interpretation as 
probabilities: 1), so that the priority is equal to the 
maximal possible preference value for the clause. [] 

The implementation of best-first search does not 
combine new items with the chart immediately, but 
makes use of an agenda ||, on which new items are 
ordered in order of descending priority. The following 
is the algorithm for bottom-up best-first Earley deduc- 
tion. 

procedure prove( Goal): 

- initialize-agenda(GoaZ) 

- consume-agenda 

- for any item {G,I) 

- return mgu(Goal, G) as solution if it exists 

procedure initialize-agenda( Goal): 

7 For further details and examples see |^] and ||. 
8 There are also other methods for assigning priorities to 
items. 



- for every unit clause f/Cin lookup( Goal, UC) 

- create the index / for UC 

- add item (UC,I) to agenda 

- for every non-unit program clause H <— Body 

- add item (H <— Body, free) to agenda 

procedure add item / to agenda 

- compute the priority of / 

- agenda := agenda U {/} 

procedure consume-agenda 

- while agenda is not empty 

- remove item / with highest priority from agenda 

- add item / to chart 

procedure add item (C,I1) to chart 

- chart := chart U {(C,il)} 

- if C is a unit clause 

- for all items (H <- G A 5 A ft, 12) 

- if I = 12 -k II exists 

and a = mgu(C, G) exists 

and goals 3 are provable with solution r 

then add item (t<t(H <— to agenda 

- if C = H <— G AS Ail is a non-unit clause 

- for all items (G' <-, 12) 

- if I = II * 12 exists 

and a = mgu(G, G') exists 

and goals 3 are provable with solution r 

then add item (t<t(H <— H),!) to agenda 



The algorithm is parametrized with respect to 
the relation lookup/2 and the choice of the indexing 
scheme, which are specific for different grammatical 
theories and directions of processing. 



4 Implementation 

The bottom-up Earley deduction algorithm described 
here has been implemented in Quintus Prolog as part of 
the GeLD system. GeLD (Generalized Linguistic De- 
duction) is an extension of Prolog which provides typed 
feature descriptions and preference values as additions 
to the expressivity of the language, and partial eval- 
uation, top-down, head-driven, and bottom-up Earley 
deduction as processing strategies. Tests of the system 
with small grammars have shown promising results, 
and a medium-scale HPSG for German is presently be- 
ing implemented in GeLD. The lookup relation and the 
choice of an indexing scheme must be specified by the 
user of the system. 



5 



5 Conclusion and Future Work 

We have proposed bottom-up Earley deduction as a 
useful alternative to the top-down methods which re- 
quire subsumption checking and restriction to avoid 
prediction loops. 

The proposed method should be improved in two 
directions. The first is that the lookup predicate should 
not have to be specified by the user, but automatically 
inferred from the program. 

The second problem is that all non-unit clauses 
of the program are added to the chart. The addition 
of non-unit clauses should be made dependent on the 
goal and the base cases in order to go from a purely 
bottom-up algorithm to a directed algorithm that com- 
bines the advantages of top-down and bottom-up pro- 
cessing. It has been repeatedly noted J|, [lj that di- 
rected methods are more efficient than pure top-down 
or bottom-up methods. However, it is not clear how 
well the directed methods are applicable to grammars 
which do not depend on concatenation and have no 
unique 'left corner' which should be connected to the 
start symbol. 

It remains to be seen how bottom-up Earley deduc- 
tion compares with (and can be combined with) the im- 
proved top-down Earley deduction of Dorre || , John- 
son jjj and Neumann and to head-driven methods 
with well- formed substring tables jl| , and which meth- 
ods are best suited for which kinds of problems (e.g. 
parsing, generation, noisy input, incremental process- 
ing etc.). 
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